Dimodal Utility Functions

Description

Miscellaneous functions for working with Dimodal results.

Usage

midquantile(x, q=((1:length(x))-1)/(length(x)-1), type=0L, feps=0.0)
runs.as.rle(runs, x)
select.peaks(pk)
center.diw(m)
match.features(m, near=10, foverlap=0.70, nomatch=NA_integer_, quiet=FALSE)
shiftID.place(feat, offset, xmid, midoff)

Value

midquantile returns a vector the same length as the data. Quantiles outside the range [0,1] return the first or last data point, even if this is discontinuous with the values at 0 or 1. In other words, the function does not follow the piecewise linear segment outside the valid range, but clips it. NA or NaN quantiles propagate.

runs.as.rle returns a list of class "rle" with members "lengths" and "values", as per the rle command. It also adds a member "nskip" with the number of non-finite values in the data within the run.

select.peaks returns a subset of the argument, possibly with zero rows. If the argument is not a "Dipeak" object, it returns a dummy empty object.

center.diw returns its argument with modified diw.peaks and diw.flats, if they exist.

match.features returns a list with four elements. "peak.lp2diw" is a vector with one element per row in lp.peaks whose value is the matching row number in diw.peaks, or nomatch if there is no match or the lp.peaks row is not a valid peak. "peak.diw2lp" is a similar map from diw.peaks to lp.peaks. "flat.lp2diw" and "flat.diw2lp" are the equivalent maps for flats.

shiftID.place returns the modified feat data frame.

Arguments

m: a "Dimodal" object returned from Dimodal
x: the original data, with the same length as the members of runs
runs: the list returned from find.runs
pk: a "Dipeak" object
near: maximum distance in points between matching peaks, or as a fraction of the length of the original data
foverlap: minimum fraction of the length of either flat that the common segment must cover
nomatch: value to use when a feature has no match in the other spacing, treated as integer internally
quiet: a boolean, TRUE to only determine the matching, FALSE to also print the aligned features
q: quantile(s) for mid-quantile approximation, by default at the data indices
type: algorithm determining segments approximating x, an integer from 0 to 4 as described in Details
feps: tolerance for matching values, per find.runs
feat: a "Dipeak", "Diflat", or "Dicpt" data frame
offset: an integer, the amount to shift position of peaks or points or endpoints of flats
xmid: a vector of interpolated quantiles to convert indices back to raw data, as stored in the "Didata"
midoff: an integer, the amount to shift positions in addition to offset

Details

The midquantile function approximates the quantile function by replacing the steps of the empirical cumulative distribution function (ecdf) with piecewise linear segments; see Ma, Genton, and Parzen (2011). This creates a ramp over tied or discrete values, giving a better estimate of the position of features, especially when there are large gaps between modes and few or no data points within them. The function determines the segment endpoints and by default evaluates them on the original data grid, scaling the vector indices to run from 0 through 1. It first converts the data to runs using find.runs, with feps defining ties.

Segmentation type 1 is the mid-distribution function of Ma et al., with the data value at the ends of runs shifted to the middle of the change. Segmentation type 2 instead shifts the quantiles by half an index, extending the step in the ecdf. These two approaches can create an envelope around the quantile function, with type 1 offset from the data at q = 0 and type 2 at q = 1. Segmentation type 3 combines both shifts, interpolating on a half grid for both x and q. It follows the quantile function better, but rounds off the curve at single data points. In practice types 1 and 3 are close. Type 4 runs segments between the middle of runs, or through the data points when there are no runs. This reduces its estimation error, but the strategy assumes that the step in the data to either side of the run is about the same. If it is not, the other approximations would move away from the center of the run.

Type 4 is best when the data has very few ties. Use type 3 or 1 when ties are common. Type 0 selects the strategy automatically, using 3 when there are ties and 4 when there are not. It uses a simple check, whether the number of unique values is a tenth of the data, to decide if there are enough ties.
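The mid-distribution idea behind type 1 can be illustrated in base R. The sketch below is a textbook version of the sample mid-quantile function from Ma, Genton, and Parzen (2011), not Dimodal's implementation. The helper mid_quantile_sketch is hypothetical, it detects ties by exact equality rather than with feps, and it does not reproduce the package's segmentation types or its endpoint handling.

```r
# Hypothetical helper, not part of Dimodal: sample mid-quantile function
# built from the mid-distribution F_mid(x) = F(x) - p(x)/2.
mid_quantile_sketch <- function(x, q) {
  x <- sort(x[is.finite(x)])
  r <- rle(x)                        # runs of exactly tied values
  vals <- r$values                   # distinct data values, ascending
  p <- r$lengths / length(x)         # relative frequency of each value
  fmid <- cumsum(p) - p / 2          # mid-distribution at each distinct value
  if (length(vals) == 1L) return(rep(vals, length(q)))
  # Piecewise linear interpolation; rule = 2 clips quantiles that fall
  # outside the range of fmid to the first or last distinct value.
  approx(fmid, vals, xout = q, rule = 2)$y
}

x <- c(1, 1, 1, 2, 3, 3)
mid_quantile_sketch(x, c(0, 0.25, 5/12, 7/12, 1))
```

Each distinct value is placed at the middle of its step in the ecdf, so a run of ties becomes a ramp rather than a jump.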

Internally the function makes two calls to compiled C code. .Call("C_midq", x, type, feps, PACKAGE="Dimodal") returns the piecewise linear segments, with $x the endpoints along the data and $q the endpoints along the quantiles. .Call("C_eval_midq", pts$x, pts$q, q, PACKAGE="Dimodal") takes these segments as its first two arguments and new quantiles as the third, and interpolates data values at those quantiles.

The value returned by find.runs has two vectors, each with the length of the data. One has non-zero values at the start of runs; the other counts skipped invalid points. The "rle" class is more compact, storing only the runs and the data values at their start. The runs.as.rle function performs this compaction.
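For readers unfamiliar with the "rle" class, the following base R lines show the compact form that runs.as.rle targets. This is only an illustration with made-up data. Base rle compares values exactly, whereas find.runs uses the feps tolerance and tracks non-finite points, and the "nskip" member added by runs.as.rle has no counterpart here.

```r
# Made-up data with two runs of tied values.
x <- c(2.1, 2.1, 2.1, 3.4, 5.0, 5.0)

r <- rle(x)          # base R run-length encoding
class(r)             # "rle"
r$lengths            # number of points in each run
r$values             # data value shared by each run

# The encoding is lossless: inverse.rle() rebuilds the original vector.
x_back <- inverse.rle(r)
```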

find.peaks returns a data frame with not only maxima but also the minima between them. It includes maxima even if they are at the first or last point, with minima to only one side. select.peaks selects only those peaks surrounded by minima. It may return a "Dipeak" object with no rows. pk need not include the modifications from Dimodal; select.peaks keeps all columns of its argument.

Indexing in interval spacing is at the end of the interval but the low-pass filter is centered. center.diw shifts the interval spacing features to align with the data, including peak positions, flat source identifiers, and flat start and end points. Note that the raw value is already shifted when set by Dimodal and will not change.

match.features aligns peaks and flats between the low-pass and interval spacing. It compares only valid maxima, as per find.peaks, and shifts interval spacing positions with center.diw before matching them. Peaks must lie within near points to match. Flats must overlap, and the common segment must be at least foverlap of the length of either flat. The function prints the position, raw value, and the number of tests that have passed their acceptance level, unless quiet is TRUE. The nomatch value is cast internally to an integer and cannot be between 1 and the number of features in either spacing, to prevent conflicts. NA, 0, and negative values are acceptable.
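The two matching rules can be sketched in base R. The helpers peaks_near and flats_overlap are hypothetical and are not the package's code. The sketch assumes that positions are integer indices, that flat endpoints are inclusive, and that the common segment must cover at least foverlap of both flats, which is one reading of "either flat".

```r
# Hypothetical helper: two peaks match if their positions differ by at
# most `near` points.
peaks_near <- function(pos1, pos2, near = 10) {
  abs(pos1 - pos2) <= near
}

# Hypothetical helper: two flats match if they overlap and the common
# segment covers at least `foverlap` of the length of each flat.
flats_overlap <- function(st1, end1, st2, end2, foverlap = 0.70) {
  common <- min(end1, end2) - max(st1, st2) + 1   # shared points, inclusive
  if (common <= 0) return(FALSE)                  # no overlap at all
  len1 <- end1 - st1 + 1
  len2 <- end2 - st2 + 1
  common >= foverlap * len1 && common >= foverlap * len2
}

peaks_near(100, 108)            # within 10 points
flats_overlap(10, 29, 12, 30)   # 18 shared points out of 20 and 19
flats_overlap(10, 29, 15, 24)   # shorter flat covers only half of the longer
```

Under this reading, a short flat that lies entirely inside a much longer one does not match, because the shared segment is a small fraction of the longer flat.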

The shiftID.place function is used in Dimodal to modify the placement of features, and is provided separately for cases where the detectors find.peaks, find.flats, or find.cpt are called directly. It adds offset to the columns "pos", "stID", "endID", "lminID", "rminID", "lsuppID", and "rsuppID", if they exist in the features data frame, to account for values skipped during filtering. Use the "lp.stID" attribute for low-pass features, "diw.stID" for interval spacing, and 2 for changepoints. If pos, stID, or endID are in the data frame, the function also adds columns "x", "xst", and "xend" respectively, holding the original data value for the index as looked up in the midquantile result xmid. Here the index is further modified by midoff; use 0 for low-pass features and changepoints, and half the interval width, stored as attribute "wdiw", for interval spacing features.
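The index shift described above can be sketched in base R. The helper shift_ids_sketch is hypothetical and covers only the offset step; it does not add the "x", "xst", or "xend" columns and does not use xmid or midoff. The column "ht" in the example data frame is made up to show that other columns are left alone.

```r
# Hypothetical helper: add `offset` to whichever index columns are present.
shift_ids_sketch <- function(feat, offset) {
  idcols <- c("pos", "stID", "endID", "lminID", "rminID", "lsuppID", "rsuppID")
  for (col in intersect(idcols, names(feat))) {
    feat[[col]] <- feat[[col]] + offset
  }
  feat
}

# Made-up feature table with two index columns and one value column.
feat <- data.frame(pos = c(5L, 12L), stID = c(3L, 10L), ht = c(0.4, 0.9))
shifted <- shift_ids_sketch(feat, 2L)
shifted
```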

References

Y. Ma, M. Genton, E. Parzen (2011), Asymptotic properties of sample quantiles of discrete distributions. Ann Inst Stat Math 63, pp. 227-243.

Examples

library(Dimodal)
m <- Dimodal(faithful$eruptions, Diopt.local(analysis=c('lp','diw')))
# How many peaks were found?  Use print.data.frame to see the full structure.
nrow(select.peaks(m$lp.peaks))
nrow(select.peaks(m$diw.peaks))
# Compare the interval spacing peaks before and after centering.
m$diw.peaks
center.diw(m)$diw.peaks
# Flats do not match because the Diw feature only covers 50% of the LP.
match.features(m)

plot(sort(iris$Petal.Length))
lines(midquantile(iris$Petal.Length, type=1L), col='red')
lines(midquantile(iris$Petal.Length, type=2L), col='blue')
lines(midquantile(iris$Petal.Length, type=3L), col='green')
lines(midquantile(iris$Petal.Length, type=4L), col='orange')

# See the Dimodal.R source code for the use of shiftID.place.

# To simplify the runs in the signed difference of the interval spacing
# runs.as.rle(Dimodal:::find.runs(m$data['signed',], 0.01), m$data['signed',])
